Topic-based language models using EM

نویسندگان

  • Daniel Gildea
  • Thomas Hofmann
چکیده

In this paper, we propose a novel statistical language model to capture topic-related long-range dependencies. Topics are modeled in a latent variable framework in which we also derive an EM algorithm to perform a topic factor decomposition based on a segmented training corpus. The topic model is combined with a standard language model to be used for on-line word prediction. Perplexity results indicate an improvement over previously proposed topic models, which unfortunately has not translated into lower word error.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Comparison of Discriminative EM-Based Semi-Supervised Learning algorithms on Agreement/Disagreement classification

Recently, semi-supervised learning has been an active research topic in the natural language processing community, to save effort in hand-labeling for data-driven learning and to exploit a large amount of readily available unlabeled text. In this paper, we apply EM-based semi-supervised learning algorithms such as traditional EM, co-EM, and cross validation EM to the task of agreement/disagreem...

متن کامل

Semantics-based Language Models for Information Retrieval and Text Mining

Semantics-based Language Models for Information Retrieval and Text Mining Xiaohua Zhou Xiaohua Hu The language modeling approach centers on the issue of estimating an accurate model by choosing appropriate language models as well as smoothing techniques. In the thesis, we propose a novel context-sensitive semantic smoothing method referred to as a topic signature language model. It extracts exp...

متن کامل

A novel language model based on self-organized learning

Statistical language model is very important to speech recognition. To a system of special topic, domain dependent language model is much better than general model. There are two problems in traditional method to train topic dependent model: 1. The corpus of special topic is not as enough as general corpus. 2. An individual article always relates to more than one topics, traditional method has ...

متن کامل

Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...

متن کامل

Latent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models

Topic models have become increasingly prominent text-analytic machine learning tools for research in the social sciences and the humanities. In particular, custom topic models can be developed to answer specific research questions. The design of these models requires a nontrivial amount of effort and expertise, motivating general-purpose topic modeling frameworks. In this paper we introduce lat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999